Pacific Northwest National Laboratory - CORE

VAST 2008 Challenge

Mini Challenge 1: Wiki Editors

Bob Baddeley, Pacific Northwest National Laboratory, bob.baddeley@pnl.gov

Student team: NO

Tool:

CORE is a web application framework, written at PNNL primarily by me using open source software such as MySQL and PHP, that lets users interface with underlying data and define different types of views. Pulling data into CORE tables is straightforward, and views of the data are defined through a simple interface that supports complex joins and styles. Views can be maps, timelines, or tables, and can be editable, allowing the user to change the data if necessary. Views can also interact with each other, so the user can see a map, a table, and a timeline side by side, all showing the same data in different ways.

Two Page Summary: NO

Answers:

Wiki-1: What are the factions represented in the edit pages and who are its members? In other words, describe the groups and their members based on their editing changes.  

Group                     Names
Objective Editors         66.66.125.x; Adriano; Afrox; Cristofor; Estirabot; Gustava; Honoria; Paintman; Ricarda; Salvadora; Socorro
Significant Contributors  Agustin; Amado; DailosTamanca; David Moron; Edemir; Rm99; Sara; RyogaNica; Savanna; VictoriaV

Significant contributors split into factions:
Faction A                 VictoriaV; Savanna; Amado; RyogaNica
Faction B                 Rm99; DailosTamanca; Sara; Agustin; David Moron; Edemir
Note: Only people with a significant number of contributions (5 or more edits or comments), or whose individual contributions were significant, are included in this list. Many more could be added who made only one or two edits, but their involvement was more casual.

Detailed Answer:

I started by importing the data (both edits and discussion) into a CORE database table. This took some cleaning, especially for the discussion pages, and it was evident early on that the edit timestamps were offset from the discussion timestamps, so associating discussion entries with specific edits over time was impossible. Most of the cleaning was done in Notepad++ with a large number of regular-expression find-and-replace operations.
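As a rough illustration of the kind of regular-expression find-and-replace involved, the Python sketch below collapses whitespace and rewrites a Wikipedia-style timestamp (for example "14:05, 3 October 2007") into the "YYYY-MM-DD HH:MM" form used in the view filters later on. The raw timestamp format and the names WIKI_TS, to_iso, and clean_line are assumptions for illustration; the actual cleaning was done interactively in Notepad++.

```python
import re
from datetime import datetime

# Assumed raw timestamp format (Wikipedia history style), e.g. "14:05, 3 October 2007".
WIKI_TS = re.compile(r"\b(\d{1,2}:\d{2}), (\d{1,2} [A-Z][a-z]+ \d{4})\b")


def to_iso(match: re.Match) -> str:
    """Rewrite one matched timestamp as 'YYYY-MM-DD HH:MM'."""
    parsed = datetime.strptime(f"{match.group(1)}, {match.group(2)}", "%H:%M, %d %B %Y")
    return parsed.strftime("%Y-%m-%d %H:%M")


def clean_line(line: str) -> str:
    """Apply hypothetical find-and-replace steps to one raw line."""
    line = re.sub(r"\s+", " ", line).strip()  # collapse runs of whitespace first
    return WIKI_TS.sub(to_iso, line)          # then normalize timestamps


print(clean_line("14:05,  3 October 2007   Amado  (edit)"))
# prints: 2007-10-03 14:05 Amado (edit)
```

Collapsing whitespace first keeps the timestamp pattern simple, since it can then assume exactly one space after the comma.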

The included video walks through using CORE views on data tables to filter the data and change its display in order to understand it better.

Once the data were normalized and inserted, the first step was to create a view of all the data to get a feel for it. I immediately noticed that many records contained only (xxx,xxx bytes) with no other useful content, so I added a filter to hide them. Then I sorted the list by user and went down the table identifying users who had contributed a lot. By skimming their edits, I could tell whether they were objective editors, who made few substantive changes but corrected grammar; contributors, who did a lot of work on the content; or short-term, single-edit users, who were often vandals.
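A minimal Python sketch of the first two steps, hiding size-only records and ranking users by contribution count, is shown below. It assumes each record is a dict with hypothetical user and content keys, that the size-only records look like "(12,345 bytes)", and it uses the 5-contribution cutoff from the note under the table; the sample records are toy data. Deciding whether a user was an objective editor, a contributor, or a vandal still came from reading the edits.

```python
import re
from collections import Counter

# A record whose entire content is a size note such as "(12,345 bytes)".
BYTES_ONLY = re.compile(r"^\s*\(\d{1,3}(?:,\d{3})* bytes\)\s*$")


def drop_size_only(records):
    """Hide records whose content carries nothing but a byte count."""
    return [r for r in records if not BYTES_ONLY.match(r["content"])]


def contributions_by_user(records):
    """Count edits and comments per user, most active first."""
    return Counter(r["user"] for r in records).most_common()


def significant_users(records, threshold=5):
    """Users with at least `threshold` contributions after filtering."""
    return [user for user, n in contributions_by_user(records) if n >= threshold]


# Toy records for illustration only.
records = (
    [{"user": "Amado", "content": "(12,345 bytes)"}]
    + [{"user": "Amado", "content": "Added a reference"}] * 5
    + [{"user": "Sara", "content": "Fixed a typo"}]
)
print(significant_users(drop_size_only(records)))  # prints: ['Amado']
```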

To pare down the table further: I noticed that all edits by users identified only by IP addresses were later removed as vandalism, so I added filters to hide any record where the user matched *.x or where the edit contained *.x. This reduced the table significantly. This image shows the creation of the view, where the user picks the columns to display, filters records with conditions, and can create styles to render specific values differently.
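The *.x wildcard filter can be expressed as a regular expression over anonymized addresses whose last octet is masked. The sketch below is an assumed equivalent, not CORE's own filter syntax; it reuses the hypothetical user and content record keys, and the addresses in the sample records are made up.

```python
import re

# Anonymized IPv4 address with the last octet masked, e.g. "10.1.2.x".
ANON_IP = re.compile(r"\b\d{1,3}\.\d{1,3}\.\d{1,3}\.x\b")


def is_ip_related(record):
    """True if the user is an anonymized IP or the edit text mentions one."""
    return bool(ANON_IP.search(record["user"]) or ANON_IP.search(record["content"]))


def hide_ip_edits(records):
    """Drop every record matching either *.x filter."""
    return [r for r in records if not is_ip_related(r)]


# Toy records for illustration only.
records = [
    {"user": "10.1.2.x", "content": "example edit"},
    {"user": "Amado", "content": "Reverted edits by 10.1.2.x"},
    {"user": "Sara", "content": "Fixed a typo"},
]
print([r["user"] for r in hide_ip_edits(records)])  # prints: ['Sara']
```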

And this is the result of that view. All table views are automatically sortable by clicking on the column names. They are also filterable, and filters are additive: for example, I could select all rows with edittype 'edit', a datetime >= "2007-10-01 00:00", and user "Amado". This is a quick way to take the larger view and temporarily show a smaller subset. For more permanent or complex queries, the user can go back and edit the view itself.
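Conceptually, additive filters are combined with AND, and clicking a column name is an ordinary sort by that column. The Python sketch below mirrors the example above; it is not CORE's implementation. It assumes rows are dicts keyed by the column names edittype, datetime, and user, with datetimes stored as "YYYY-MM-DD HH:MM" strings (which sort correctly as text); the "discussion" edittype value and the sample rows are hypothetical.

```python
def apply_filters(rows, filters):
    """Keep rows that pass every filter; adding a filter can only narrow the result."""
    return [row for row in rows if all(f(row) for f in filters)]


def sort_by(rows, column, descending=False):
    """Equivalent of clicking a column name in a table view."""
    return sorted(rows, key=lambda row: row[column], reverse=descending)


filters = [
    lambda r: r["edittype"] == "edit",
    lambda r: r["datetime"] >= "2007-10-01 00:00",  # ISO-style strings compare in time order
    lambda r: r["user"] == "Amado",
]

# Toy rows for illustration only.
rows = [
    {"edittype": "edit", "datetime": "2007-10-02 09:30", "user": "Amado"},
    {"edittype": "discussion", "datetime": "2007-10-03 11:00", "user": "Amado"},
    {"edittype": "edit", "datetime": "2007-09-28 16:45", "user": "Amado"},
    {"edittype": "edit", "datetime": "2007-10-01 08:15", "user": "Amado"},
]
subset = sort_by(apply_filters(rows, filters), "datetime")
print([r["datetime"] for r in subset])  # prints: ['2007-10-01 08:15', '2007-10-02 09:30']
```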

Reading through again, I confirmed that my list of significant contributors was correct. Other contributors also had edits accepted, but their presence was so brief, and the available data about the content of their edits so limited, that they could not be associated with any faction.

Finally, I added a filter to my view to show only edits and discussion by the significant contributors. Sorting the view by time let me see the interactions among only these members in the order they happened. With this list I read through again and formed a basic idea of the factions. I added styles to my view so that contributions by each faction appeared in a different color, then went back through the view to confirm. The colors changed my view of the characters: I had originally placed RyogaNica and Edemir in a third faction, but after looking at their relationship with each other and with the other factions, I changed my mind.

Note in this diagram the addition of the condition that limits it to specific users, and the addition of styles that describe the background colors of the cells based on their value.

And this is the result of that view, with members of one faction in shades of yellow, and members of the other faction in shades of green.
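The condition and styles in that view amount to restricting rows to the significant contributors, sorting by time, and mapping each user to a shade of their faction's color. A minimal sketch using the faction lists from the table of groups follows; the hex values, and which faction gets yellow, are assumptions, since the view's actual style values are not given. The sample rows are toy data.

```python
# Faction membership from the table of groups.
FACTION_A = ["VictoriaV", "Savanna", "Amado", "RyogaNica"]
FACTION_B = ["Rm99", "DailosTamanca", "Sara", "Agustin", "David Moron", "Edemir"]

# Hypothetical shades: yellows for one faction, greens for the other.
YELLOWS = ["#fff9c4", "#fff59d", "#fff176", "#ffee58"]
GREENS = ["#e8f5e9", "#c8e6c9", "#a5d6a7", "#81c784", "#66bb6a", "#4caf50"]

# One background color per user; users outside both factions have no entry.
STYLE = {**dict(zip(FACTION_A, YELLOWS)), **dict(zip(FACTION_B, GREENS))}


def styled_timeline(rows):
    """Significant contributors only, in time order, each with a background color."""
    kept = sorted((r for r in rows if r["user"] in STYLE), key=lambda r: r["datetime"])
    return [(r["datetime"], r["user"], STYLE[r["user"]]) for r in kept]


rows = [
    {"datetime": "2007-10-02 10:00", "user": "Rm99"},
    {"datetime": "2007-10-02 09:00", "user": "VictoriaV"},
    {"datetime": "2007-10-02 09:30", "user": "Adalberto"},
]
for entry in styled_timeline(rows):
    print(entry)
```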

In the image below, you can see another conflict between the factions. You'll also notice that Adalberto appears. Showing only the characters of interest removed some of the context and led me to believe that an individual was responding to the row just before, when in fact there were a few other entries in between. Adding back all the edits, while still coloring the ones of interest, resolved this issue.
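The pitfall described here can be made concrete: once other users' rows are filtered out, the row shown just above a reply is not necessarily the one it followed. The sketch below, assuming the same hypothetical user and datetime row keys and toy data, pairs each row of interest with its true predecessor in the full, time-sorted history.

```python
def true_predecessors(rows, of_interest):
    """Pair each row by a user of interest with the row that actually preceded it
    in the full history, whoever wrote it. The very first row has no predecessor."""
    ordered = sorted(rows, key=lambda r: r["datetime"])
    return [
        (prev["user"], cur["user"])
        for prev, cur in zip(ordered, ordered[1:])
        if cur["user"] in of_interest
    ]


rows = [
    {"datetime": "2007-10-02 09:00", "user": "VictoriaV"},
    {"datetime": "2007-10-02 09:30", "user": "Adalberto"},
    {"datetime": "2007-10-02 10:00", "user": "Rm99"},
]
print(true_predecessors(rows, {"VictoriaV", "Rm99"}))  # prints: [('Adalberto', 'Rm99')]
```

A view filtered to VictoriaV and Rm99 alone would show Rm99 directly after VictoriaV; the full history shows Adalberto in between.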

VictoriaV is clearly the most active and opinionated contributor and does not get along well with most of the others. Rm99 behaves the same way. However, VictoriaV does not undo changes made by Savanna, Amado, or RyogaNica, and reverts edits back to their versions. So I would put those four together.
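One way to check this revert pattern programmatically is to look for revisions that restore the exact page text of an earlier revision and count whose version was restored. This is a sketch under a strong assumption: that the full page text is available for each revision (a hypothetical text key), which the reading-based analysis above did not rely on. The sample revisions are toy data.

```python
from collections import Counter


def revert_counts(revisions):
    """Count (reverter, restored_author) pairs: a revision whose full page text
    exactly matches an earlier revision's text (but not the one right before it)
    is treated as a revert to that earlier author's version."""
    ordered = sorted(revisions, key=lambda r: r["datetime"])
    first_author = {}  # page text -> user who first produced that text
    counts = Counter()
    prev_text = None
    for rev in ordered:
        restored = first_author.get(rev["text"])
        if restored is not None and rev["text"] != prev_text and restored != rev["user"]:
            counts[(rev["user"], restored)] += 1
        first_author.setdefault(rev["text"], rev["user"])
        prev_text = rev["text"]
    return counts


revisions = [
    {"datetime": "2007-10-01 09:00", "user": "Amado", "text": "version A"},
    {"datetime": "2007-10-01 10:00", "user": "Rm99", "text": "version B"},
    {"datetime": "2007-10-01 11:00", "user": "VictoriaV", "text": "version A"},
]
print(revert_counts(revisions))  # prints: Counter({('VictoriaV', 'Amado'): 1})
```

Exact-text matching misses partial reverts, so a count like this could only support, not replace, reading the edits.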

The second faction is clearly in dispute with VictoriaV's faction. Edit wars between the two are frequent, leading to the page being frozen and to violations of the three-revert rule. Members of this group sometimes edit together and rarely step on each other's toes.

Wiki-2:  Is the Paraiso movement involved in violent activities? 

NO

List of wiki edits providing evidence

None; the answer rests on a lack of evidence rather than on specific edits.

Short Answer:

The content of the article, the discussion page, and the edits are all non-violent. There are no threats, no single person or group is consistently vandalizing the page, and the most heated debates have been only about references and point of view. There are clearly factions, and some members are more vocal than others, but none has said anything about violence.